Evaluating Performance and Portability of OpenCL Programs
نویسندگان
چکیده
Recently, OpenCL, a new open programming standard for GPGPU programming, has become available in addition to CUDA. OpenCL can support various compute devices due to its higher abstraction programming framework. Since there is a semantic gap between OpenCL and compute devices, the OpenCL C compiler plays important roles to exploit the potential of compute devices and therefore its capability should be clarified. In this paper, the performance of CUDA and OpenCL programs is quantitatively evaluated. First, several CUDA and OpenCL programs of almost the same computations are developed, and their performances are compared. Then, the main factors causing their performance differences is investigated. The evaluation results suggest that the performances of OpenCL programs are comparable with those of CUDA ones if the kernel codes are appropriately optimized by hand or by the compiler optimizations. This paper also discusses the differences between NVIDIA and AMD OpenCL implementations by comparing the performances of their GPUs for the same programs. The performance comparison shows that the compiler options of the OpenCL C compiler and the execution configuration parameters have to be optimized for each GPU to obtain its best performance. Therefore, automatic parameter tuning is essential to enable a single OpenCL code to run efficiently on various GPUs.
منابع مشابه
Improving Performance Portability in OpenCL Programs
We study the performance portability of OpenCL across diverse architectures including NVIDIA GPU, Intel Ivy Bridge CPU, and AMD Fusion APU. We present detailed performance analysis at assembly level on three exemplar OpenCL benchmarks: SGEMM, SpMV, and FFT. We also identify a number of tuning knobs that are critical to performance portability, including threads-data mapping, data layout, tiling...
متن کاملAutomatic Translation of Cuda to Opencl and Comparison of Performance Optimizations on Gpus
As an open, royalty-free framework for writing programs that execute across heterogeneous platforms, OpenCL gives programmers access to a variety of data parallel processors including CPUs, GPUs, the Cell and DSPs. All OpenCL-compliant implementations support a core specification, thus ensuring robust functional portabiity of any OpenCL program. This thesis presents the CUDAtoOpenCL source-to-s...
متن کاملPerformance Portability in Accelerated Parallel Kernels
Heterogeneous architectures, by definition, include multiple processing components with very different microarchitectures and execution models. In particular, computing platforms from supercomputers to smartphones can now incorporate both CPU and GPU processors. Disparities between CPU and GPU processor architectures have naturally led to distinct programming models and development patterns for...
متن کاملCompiler and Runtime Techniques for Bulk - Synchronous Programming Models on Cpu Architectures
The rising pressure to simultaneously improve performance and reduce power consumption is driving more heterogeneity into all aspects of computing devices. However, wide adoption of specialized computing devices such as GPUs and Xeon Phis comes with a programming challenge. A carefully optimized program that is well matched to the target hardware can run many times faster and more energy effici...
متن کامل